Abstract

Collaborative filtering recommender systems (CFRSs) are key components
of successful e-commerce systems. Because of their openness, CFRSs are
highly vulnerable to attacks. However, since the number of attack profiles
is far smaller than the number of genuine users, conventional supervised
learning based detection methods could be too “dull” to handle such
imbalanced classification. In this paper, we improve detection performance
in the following two aspects. First, we extract well-designed features from
user profiles based on the statistical properties of the diverse attack
models, making the hard classification task easier to perform. Then,
following the general idea of re-scale Boosting (RBoosting) and AdaBoost, we
apply a variant of AdaBoost, called re-scale AdaBoost (RAdaBoost), as our
detection method based on the extracted features. RAdaBoost is comparable to
the optimal Boosting-type algorithm and can effectively improve the
performance in some hard scenarios. Finally, a series of experiments on the
MovieLens-100K data set is conducted to demonstrate that RAdaBoost
outperforms some classical techniques such as SVM, kNN and AdaBoost.

1. Introduction

Personalized recommender systems (RSs) utilize a variety of recommendation
methods to suggest products that users may like, such as movies, music,
news, books and other products. Collaborative filtering recommender systems
(CFRSs) have proved to be one of the most successful kinds of RSs and are
used by many e-commerce companies such as Amazon, Ringo, eBay, GroupLens
etc. In practice, CFRSs are prone to manipulation by attackers because of
their openness. Typically, attackers inject carefully chosen attack profiles
into CFRSs in order to bias the recommendation results to their benefit,
which is termed “shilling” or “profile injection” attacks. Such attacks
decrease the trustworthiness of recommendations and have a negative impact
on the CFRSs. Thus, constructing an effective method to detect the attackers
and remove them from the CFRSs is crucial.

Supervised learning based detection of “shilling” or “profile injection”
attacks in CFRSs is an important research direction, which regards the
detection attributes as classification features and distinguishes attack
profiles from genuine profiles by the constructed features. The attack
detection problem can be formulated as an imbalanced classification: the
number of attackers is far smaller than the number of genuine users in
CFRSs, especially when the attack size is small. However, traditional
supervised learning based attack detection methods (e.g., SVM and kNN) have
individual weaknesses in handling this kind of problem and fail to
effectively capture the attackers of concern.

In the current paper, we aim to improve detection performance in two
aspects. Firstly, we consider that the overall statistical signature of
attack profiles differs significantly from that of genuine profiles. The
difference comes from two sources: the distribution of ratings (or items
among the filler items or selected items) and the ratings of the target
items. Based on the statistical properties of the diverse attack models, as
many features as possible are designed and used to transform the “inputs”,
distorting the space so that the task (i.e., classification or clustering)
becomes easier to perform. Specifically, we extract 18 features from user
profiles (consisting of attack profiles and genuine profiles) to construct a
sophisticated feature representation for each user, making it much easier to
classify. Secondly, following the general idea of re-scale Boosting
(RBoosting) [17], [2n] and AdaBoost, we apply a variant of the Boosting
algorithm, called re-scale AdaBoost (RAdaBoost), as our detection method
based on the extracted features. RBoosting has been theoretically and
experimentally shown to be better than the classical Boosting algorithm.
Furthermore, the theoretical near optimality of the numerical convergence of
RBoosting among all the variants of Boosting-type algorithms was also
established. This means that if the parameter is appropriately selected,
RBoosting is comparable to the optimal Boosting-type algorithm. AdaBoost
[10] is one of the most popular ensemble techniques and has been shown to be
very effective in practice in some hard scenarios [IH]. Typically, AdaBoost
employs a re-weighted loss function to gradually increase emphasis (or
weights) on misclassifications (i.e., the attackers of concern) and can
distinctly improve the predictive performance on a difficult data set. Thus,
with the help of the re-scale operator, RAdaBoost can be used in conjunction
with many other types of learning algorithms (or weak learners) to improve
the performance in “shilling” attack detection. Finally, a series of
experiments on the MovieLens-100K dataset is conducted to demonstrate that
RAdaBoost outperforms conventional classification techniques such as SVM,
kNN and the original non-rescaled AdaBoost in terms of classification error,
detection rate and false alarm rate. The experimental results show that
RAdaBoost can effectively improve the performance.

Footnote: the attack size is the ratio between the number of attackers and
genuine users.

The rest of the paper is organized as follows. Section 2 briefly introduces
related work. Section 3 briefly introduces attack profiles and attack
models. Section 4 describes our approach in detail. Section 5 reports and
analyzes the experimental results. The last section concludes the paper with
a brief summary and outlines directions for future work.

2. Related work 

Existing work in this area has focused on detecting and preventing
“shilling” attacks (or “profile injection” attacks). Burke et al. [3]
proposed and studied several attributes derived from user profiles for their
utility in attack detection. They employed the kNN classifier as their
detection approach, but it is unsuccessful when detecting attacks with small
filler size. Then, Williams et al. [2S], [2S] tried to extract features from
user profiles and utilized them to detect shilling attacks. They also
suffered from low detection accuracy, and many genuine profiles were
misclassified as attack profiles. After that, He et al. introduced rough set
theory into shilling attack detection by taking features of user profiles as
the condition attributes of the decision table. However, their method
suffered from a low overall classification rate in some cases, especially
for the bandwagon attack. Afterwards, Wu et al. [28] proposed a hybrid
detection method, which combined naive Bayesian classifiers and augmented
expectation maximization based on several selected metrics. Regrettably,
their technique also suffered from low F-measure [S] when the filler size is
small. Zhang et al. [30] introduced the idea of ensemble learning to improve
predictive capability in the attack detection problem. They constructed base
classifiers (or weak learners) with the Support Vector Machine (SVM)
approach and then integrated them to generate a learner with high predictive
ability for detection. Their method exhibited better performance than some
benchmark methods. Nevertheless, it still suffered from low precision,
especially when the attack size is small. In addition, Zhang et al. [31]
also proposed an online method, HHT-SVM, to detect profile injection attacks
by combining the Hilbert-Huang transform (HHT) and the support vector
machine (SVM). They created a rating series for each user profile based on
the novelty and popularity of items in order to provide basic data for
feature extraction. The precision of their method was better than that of
the benchmark methods, but it decreased significantly as the filler size
increased.

Generally speaking, previous studies showed that the detection results for
“shilling” attacks are unsatisfactory and leave much to be desired,
especially when the filler size or attack size is small. In the current
work, we intend to improve the detection performance in two aspects. First,
we introduce more well-designed features to depict the distinction between
attack profiles and genuine profiles, making the hard classification task
(i.e., with small filler size and attack size) easier to perform. Secondly,
since conventional classification techniques could be inadequate to handle
such imbalanced classification, particularly when the attack size is small,
we apply a variant of the AdaBoost algorithm, called re-scale AdaBoost
(RAdaBoost), as our detection method. RAdaBoost gradually increases emphasis
on the attacks of concern and can distinctly improve the performance on the
imbalanced classification task.

Footnote: the filler size is the ratio between the number of items rated by
user u and the number of entire items in the recommender system.

3. Attack profiles and attack models 

Attackers have different attack intents when biasing the recommendation
results for their benefit. In the literature, “shilling” attacks are
classified into two types: nuke attacks and push attacks [25]. In nuke
attacks, attackers demote the target items by giving them the lowest score,
whereas in push attacks, attackers promote the target items by giving them
the highest score. In order to effectively “nuke” or “push” a target item,
the attacker should clearly know the form of the attack profiles. The
general form of attack profiles is shown in Table 1. The details of the four
sets of items are described as follows:

$I_T$: a set of target items with a single item or multiple items, called a
single-target attack or a multi-target attack, respectively. The target
rating is generally the maximum or minimum value in the entire profile;

$I_S$: the set of selected items, with ratings specified by the function
$\sigma(i)$ [32];

$I_F$: a set of randomly chosen filler items with randomly assigned ratings;

$I_N$: a set of items with no ratings.

In the present work, we utilize 14 attack models to generate attack
profiles. The involved attack profiles and corresponding explanations are
listed in Table 2. The details of these attack models are described as
follows:

1) Random attack: $I_S = \emptyset$ and $\rho(i) \sim N(\bar{r}, \sigma^2)$ [HS].

2) Average attack: $I_S = \emptyset$ and $\rho(i) \sim N(\bar{r}_i, \sigma_i^2)$ [33].


Table 1: General form of attack profiles.

Table 2: Attack models summary.

| Attack model        | I_S items                       | I_S rating                      | I_F items             | I_F rating                      | I_N  | I_T (push/nuke) |
|---------------------|---------------------------------|---------------------------------|-----------------------|---------------------------------|------|-----------------|
| Random              | null                            | null                            | randomly chosen       | normal dist. around system mean | null | r_max/r_min     |
| Average             | null                            | null                            | randomly chosen       | normal dist. around item mean   | null | r_max/r_min     |
| Bandwagon (average) | popular items                   | r_max/r_min                     | randomly chosen       | normal dist. around item mean   | null | r_max/r_min     |
| Bandwagon (random)  | popular items                   | r_max/r_min                     | randomly chosen       | normal dist. around system mean | null | r_max/r_min     |
| Segment             | segmented items                 | r_max/r_min                     | randomly chosen       | r_min/r_max                     | null | r_max/r_min     |
| Reverse Bandwagon   | unpopular items                 | r_min/r_max                     | randomly chosen       | system mean                     | null | r_max/r_min     |
| Love/Hate           | null                            | null                            | randomly chosen       | r_min/r_max                     | null | r_max/r_min     |
| AOP                 | null                            | null                            | x% most popular items | normal dist. around item mean   | null | r_max/r_min     |
| PIA-AS              | power items                     | normal dist. around item mean   | null                  | null                            | null | r_max/r_min     |
| PIA-ID              | power items                     | normal dist. around item mean   | null                  | null                            | null | r_max/r_min     |
| PIA-NR              | power items                     | normal dist. around item mean   | null                  | null                            | null | r_max/r_min     |
| PUA-AS              | copied from power user profiles | copied from power user profiles | null                  | null                            | null | r_max/r_min     |
| PUA-ID              | copied from power user profiles | copied from power user profiles | null                  | null                            | null | r_max/r_min     |
| PUA-NR              | copied from power user profiles | copied from power user profiles | null                  | null                            | null | r_max/r_min     |

3) Bandwagon (average) attack: $I_S$ contains a set of popular items,
$\sigma(i) = r_{\max}/r_{\min}$ (push/nuke) and
$\rho(i) \sim N(\bar{r}_i, \sigma_i^2)$.

4) Bandwagon (random) attack: $I_S$ contains a set of popular items,
$\sigma(i) = r_{\max}/r_{\min}$ (push/nuke) and
$\rho(i) \sim N(\bar{r}, \sigma^2)$ [27].

5) Segment attack: $I_S$ contains a set of segmented items,
$\sigma(i) = r_{\max}/r_{\min}$ (push/nuke) and
$\rho(i) = r_{\min}/r_{\max}$ (push/nuke).

6) Reverse Bandwagon attack: $I_S$ contains a set of unpopular items,
$\sigma(i) = r_{\min}/r_{\max}$ (push/nuke) and
$\rho(i) \sim N(\bar{r}, \sigma^2)$ [12].

7) Love/Hate attack: $I_S = \emptyset$ and $\rho(i) = r_{\min}/r_{\max}$
(push/nuke).

8) AOP attack: A simple and effective strategy to obfuscate the Average
attack is to choose filler items with equal probability from the top x% of
most popular items rather than from the entire collection of items [22].

9) PIA-AS attack: The top-N items with the highest aggregate similarity
(AS) scores become the selected set of power items. This method requires at
least 5 users who have rated the same item i and item j [22].

10) PIA-ID attack: Based on In-Degree centrality, power items participate in
the highest number of similarity neighborhoods. For each item i, compute its
similarity with every other item j applying significance weighting based on
$n_{i,j}$, the number of users who have rated both items i and j, then
discard all but the top-N neighbors for each item i. Count the number of
similarity scores for each item j, and select the top-N items [22].

11) PIA-NR attack: Power items are the items with the highest number of user
ratings. We select the top-N items based on the total number of user ratings
they have in their profile [22].

12) PUA-AS attack: The top 50 users with the highest Aggregate Similarity
scores become the selected set of power users. This method requires at least
5 co-rated items between user u and user v and does not use significance
weighting [21].

13) PUA-ID attack: Based on the In-Degree centrality concept from social
network analysis, power users are those who participate in the highest
number of neighborhoods. For each user u, compute its similarity with every
other user v applying significance weighting, then discard all but the top
50 neighbors for each user u. Count the number of similarity scores for each
user v and select the top 50 users [21].

14) PUA-NR attack: Power users are the users with the highest number of
ratings. We select the top 50 users based on the total number of ratings
they have in their user profile [21].
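
As an illustration of how attack profiles of this kind can be generated, the
following is a minimal sketch for the Random and Average models in the nuke
setting. It is not the authors' generator: the function name
`make_nuke_profile`, the convention that 0 marks an unrated item, and the
rounding and clipping of sampled ratings to the 1 to 5 scale are all
assumptions made for the example.

```python
import numpy as np

def make_nuke_profile(ratings, target, filler_size, model="random",
                      r_min=1, r_max=5, rng=None):
    """Build one nuke-attack profile (Random or Average model).

    ratings     : (n_users, n_items) array, 0 = unrated (assumed convention)
    target      : index of the target item
    filler_size : fraction of all items used as filler items
    """
    rng = np.random.default_rng() if rng is None else rng
    n_items = ratings.shape[1]
    rated = ratings > 0
    sys_mean = ratings[rated].mean()
    sys_std = ratings[rated].std()

    profile = np.zeros(n_items)
    candidates = np.setdiff1d(np.arange(n_items), [target])
    n_fill = int(round(filler_size * n_items))
    fillers = rng.choice(candidates, size=n_fill, replace=False)

    for i in fillers:
        col = ratings[:, i][rated[:, i]]
        if model == "average" and col.size > 0:
            mu, sd = col.mean(), col.std()      # item mean and deviation
        else:
            mu, sd = sys_mean, sys_std          # system mean and deviation
        # Sample, round to an integer score, and keep it on the rating scale
        profile[i] = np.clip(np.rint(rng.normal(mu, sd)), r_min, r_max)

    profile[target] = r_min                     # nuke: lowest score on target
    return profile
```

For a push attack, the last assignment would use `r_max` instead. The other
models differ mainly in how the selected set is chosen and rated.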

4. Our approach 

In this section, we first present an overall introduction of our approach.
Then, the two main aspects of our work, feature extraction from user
profiles and RAdaBoost for attack detection, are described in detail.

4.1. The framework of our approach

As shown in Figure 1, our approach consists of four phases: constructing the
training and test data sets, feature extraction, training a classifier via
RAdaBoost, and testing to generate detection results. In the first phase,
the data sets are constructed from attack profiles (generated by injecting
diverse attack models) and genuine profiles. Concretely, for the training
data set, we use several representative attack models such as the Random and
Average attacks to generate mixed attack profiles. In particular, we
moderately increase the number of attacks (160 attackers for each attack
model) when constructing the training data set in order to relieve the
imbalance in the training phase (more details in Section 5). Then, we
combine them with genuine profiles to form our training data set. For the
test data sets, attack profiles with different filler sizes and attack sizes
are inserted into the genuine profiles (see Section 5). In the feature
extraction phase, we employ 18 features (more details in the next
subsection) extracted from user profiles to build a feature representation
(or feature vector) for each user in both the training data set and the test
data sets. In the training phase, we use RAdaBoost to train a strong
composite estimator (or classifier) based on the training features. Finally,
in the testing phase, we feed the features retrieved from the test data sets
into the trained estimator and generate the detection results.

4.2. Feature extraction from user profiles

Previous works summarized different metrics to characterize the features
extracted from user profiles. These features generally fall into two types:
generic and type-specific features. The generic features are basic
descriptive statistics that attempt to discriminate between attack profiles
and genuine profiles, and the type-specific features are implemented to
detect characteristics of profiles generated by specific attack models or
specific signatures of attacks. In the present work, we employ 10 features
from these two types. Besides, we also employ 5 features based on the filler
size [31] and propose 3 additional new features which measure the
distribution of specific ratings, such as the mean rating, maximum rating
and minimum rating, in the filler items of each user.

4.2.1. Generic features

Attack profiles usually have a high deviation from the mean value for the
target items and a low deviation from the mean value for the remaining
items. Thus, generic features such as RDMA, WDMA etc. are often used to
measure the rating deviation of user profiles.

Figure 1: The framework of our approach.


Rating Deviation from Mean Agreement (RDMA):

$$\mathrm{RDMA}_u = \frac{\sum_{i=0}^{N_u} \frac{|r_{u,i} - \bar{r}_i|}{NR_i}}{N_u} \qquad (4.1)$$

where $N_u$ is the number of ratings that user u has given and $NR_i$ is the
number of ratings provided for item i. $r_{u,i}$ denotes the rating given by
user u to item i, and $\bar{r}_i$ denotes the mean rating of item i across
all users.

Weighted Deviation from Mean Agreement (WDMA):

$$\mathrm{WDMA}_u = \frac{\sum_{i=0}^{N_u} \frac{|r_{u,i} - \bar{r}_i|}{NR_i^2}}{N_u} \qquad (4.2)$$

Weighted Degree of Agreement (WDA):

$$\mathrm{WDA}_u = \sum_{i=0}^{N_u} \frac{|r_{u,i} - \bar{r}_i|}{NR_i} \qquad (4.3)$$

Length Variance (LengthVar):

$$\mathrm{LengthVar}_u = \frac{|n_u - \bar{n}|}{\sum_{k \in U} (n_k - \bar{n})^2} \qquad (4.4)$$

where $n_u$ is the total number of ratings in the system for user u, $U$ is
the set of all users in the system, and $\bar{n}$ is the average length of a
profile in the system.
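
The four generic features can be computed directly from a rating matrix. The
sketch below is one possible vectorised reading of the standard definitions
of RDMA, WDMA, WDA and LengthVar, assuming a dense `(n_users, n_items)`
array in which 0 marks an unrated item; the function name
`generic_features` and the guards against division by zero are choices made
for this example only.

```python
import numpy as np

def generic_features(R):
    """Return (RDMA, WDMA, WDA, LengthVar), one value per user.

    R : (n_users, n_items) rating matrix, 0 = unrated (assumed convention).
    """
    rated = R > 0
    NR = rated.sum(axis=0)                        # NR_i: ratings per item
    safe_NR = np.maximum(NR, 1)                   # avoid division by zero
    item_mean = R.sum(axis=0) / safe_NR           # mean rating of each item
    dev = np.abs(R - item_mean) * rated           # |r_ui - mean_i| on rated cells
    n_u = rated.sum(axis=1)                       # profile length of each user
    safe_n = np.maximum(n_u, 1)

    wda = (dev / safe_NR).sum(axis=1)
    rdma = wda / safe_n
    wdma = (dev / safe_NR ** 2).sum(axis=1) / safe_n

    n_bar = n_u.mean()
    # Undefined if every profile has the same length (zero denominator)
    length_var = np.abs(n_u - n_bar) / ((n_u - n_bar) ** 2).sum()
    return rdma, wdma, wda, length_var
```

Note that WDA is simply the numerator of RDMA, which is why the sketch
computes it first and reuses it.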
4.2.2. Type-specific features

Model-based methods assume that we have some prior knowledge about the
attack models. Based on an assumed model, ratings can be automatically
divided into filler items and selected items. Therefore, measurements such
as MeanVar, FMTD etc. can be calculated from each subset to measure the
authenticity of profiles.

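Mean Variance (MeanVar):

$$\mathrm{MeanVar}_u = \frac{\sum_{j \in P_{u,F}} (r_{u,j} - \bar{r}_j)^2}{|P_{u,F}|} \qquad (4.5)$$

where $P_{u,F}$ is the rest of the profile: $P_{u,F} = P_u - P_{u,T}$,
$P_{u,T} = \{i \in P_u \text{ such that } r_{u,i} = r_{\max}\}$ (or
$r_{\min}$ for nuke attack), and $P_u$ is the profile of user u.

Filler Mean Target Difference (FMTD):

$$\mathrm{FMTD}_u = \left| \frac{\sum_{i \in P_{u,T}} r_{u,i}}{|P_{u,T}|} - \frac{\sum_{k \in P_{u,F}} r_{u,k}}{|P_{u,F}|} \right| \qquad (4.6)$$

Target Model Focus (TMF):

$$\mathrm{TMF}_u = \max_{j \in P_T} F_j \qquad (4.7)$$

where $F_j = \left(\sum_{u \in U} \Theta_{u,j}\right) / \left(\sum_{u \in U} |P_{u,T}|\right)$,
and $\Theta_{u,j}$ is 1 if $j \in P_{u,T}$, 0 otherwise. $P_T$ denotes the
item set of potential targets [2S].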
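
To make the partition into $P_{u,T}$ and $P_{u,F}$ concrete, here is a small
sketch that computes MeanVar and FMTD for a single profile. The function
name `meanvar_fmtd`, the use of 0 for unrated items, and the choice to
return 0 when a partition is empty are assumptions of this example rather
than part of the definitions.

```python
import numpy as np

def meanvar_fmtd(profile, item_mean, r_target):
    """MeanVar and FMTD for one user profile.

    profile   : 1-D array of ratings, 0 = unrated (assumed convention)
    item_mean : 1-D array with the mean rating of each item
    r_target  : r_max for push attacks, r_min for nuke attacks
    """
    rated = profile > 0
    in_T = rated & (profile == r_target)     # P_{u,T}: suspected targets
    in_F = rated & ~in_T                     # P_{u,F} = P_u - P_{u,T}

    if not in_F.any():
        return 0.0, 0.0                      # convention for an empty filler set
    mean_var = np.mean((profile[in_F] - item_mean[in_F]) ** 2)
    if not in_T.any():
        return float(mean_var), 0.0          # no suspected target ratings
    fmtd = abs(profile[in_T].mean() - profile[in_F].mean())
    return float(mean_var), float(fmtd)
```

For example, a nuke-style profile `[1, 3, 4, 0, 1]` with item means
`[3, 3, 3, 3, 2]` and `r_target = 1` puts items 0 and 4 in $P_{u,T}$ and
items 1 and 2 in $P_{u,F}$.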

Filler Mean Variance (FMV):

$$\mathrm{FMV}_u = [\text{equation missing}] \qquad (4.8)$$

where the sum is taken over the partition of the profile of user u
hypothesized to be the set of filler items F by model m, and the normalizer
is the number of items in the hypothesized filler partition of the profile
by model m.

Filler Mean Difference (FMD):

$$\mathrm{FMD}_u = \frac{1}{|U_u|} \sum_{j=1}^{|U_u|} |r_{u,j} - \bar{r}_j| \qquad (4.9)$$

where $U_u$ is the partition of the profile of user u and $|U_u|$ is its
size.

Filler Average Correlation (FAC):

$$\mathrm{FAC}_u = [\text{equation missing}] \qquad (4.10)$$

where $I_u$ is the set of items rated by user u.

4.2.3. Features based on the filler size

User profiles with different numbers of ratings will generate different
features. Similarly, the number of ratings on different types of items will
also generate different features, such as FSTI, FSPI etc.

Filler Size with Total Items (FSTI): The ratio between the number of items
rated by user u and the number of entire items in the recommender system
[31].

$$\mathrm{FSTI}_u = \frac{\sum_{i=1}^{|I|} O(r_{u,i})}{|I|} \qquad (4.11)$$

where $I$ is the set of items in the system, $|I|$ denotes the total number
of items in the system, and $O(r_{u,i})$ is 1 if user u rated item i, 0
otherwise.

Filler Size with Popular Items (FSPI): The ratio between the number of
popular items rated by user u and the number of entire popular items in the
recommender system [31].

$$\mathrm{FSPI}_u = \frac{\sum_{i=1}^{K} O(r_{u,i})}{K} \qquad (4.12)$$

where $K$ denotes the boundary point between popular items and unpopular
items.

Filler Size with Popular Items in Itself (FSPII): The ratio between the
number of popular items rated by user u and the number of entire items rated
by user u [31].

$$\mathrm{FSPII}_u = \frac{\sum_{i=1}^{K} O(r_{u,i})}{\sum_{k=1}^{|I|} O(r_{u,k})} \qquad (4.13)$$

Filler Size with Unpopular Items (FSUI): The ratio between the number of
unpopular items rated by user u and the number of entire unpopular items in
the recommender system [31].

$$\mathrm{FSUI}_u = \frac{\sum_{i=K+1}^{|I|} O(r_{u,i})}{|I| - K} \qquad (4.14)$$

Filler Size with Unpopular Items in Itself (FSUII): The ratio between the
number of unpopular items rated by user u and the number of entire items
rated by user u [31].

$$\mathrm{FSUII}_u = \frac{\sum_{i=K+1}^{|I|} O(r_{u,i})}{\sum_{k=1}^{|I|} O(r_{u,k})} \qquad (4.15)$$
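
The five filler-size features are all counts of rated items divided by a
different normalizer, which the following sketch makes explicit. It assumes
that items are ordered by popularity, measured here as the number of users
who rated them, so that the first K items are the popular ones; this
ordering rule, the function name `filler_size_features` and the 0 =
unrated convention are assumptions of the example.

```python
import numpy as np

def filler_size_features(R, K):
    """FSTI, FSPI, FSPII, FSUI and FSUII for every user.

    R : (n_users, n_items) rating matrix, 0 = unrated (assumed convention)
    K : number of items treated as popular (boundary point, assumed given)
    """
    rated = R > 0
    # Sort items so the most frequently rated ones come first
    order = np.argsort(-rated.sum(axis=0), kind="stable")
    rated = rated[:, order]

    n_items = R.shape[1]
    total = rated.sum(axis=1)               # items rated by each user
    pop = rated[:, :K].sum(axis=1)          # popular items rated
    unpop = rated[:, K:].sum(axis=1)        # unpopular items rated
    safe_total = np.maximum(total, 1)       # guard for empty profiles

    return {
        "FSTI": total / n_items,
        "FSPI": pop / K,
        "FSPII": pop / safe_total,
        "FSUI": unpop / (n_items - K),
        "FSUII": unpop / safe_total,
    }
```

By construction FSPII and FSUII sum to 1 for any user with at least one
rating, so in practice one of the two carries the information.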


4.2.4. Our proposed features

We propose 3 new features which focus on the number of specific ratings
(such as the maximum score, minimum score and average score) on filler or
selected items. Since attackers show different attack intents in CFRSs, the
filler or selected set of attack profiles may be filled by specific items
(e.g., popular items for the Bandwagon (average and random) attacks,
randomly selected items for the Random attack) with the highest score, the
lowest score or the average score. Taking nuke attacks as an example, the
selected items or filler items are rated with the maximum score in the
Reverse Bandwagon, Segment and Love/Hate attacks (as shown in Table 2).
Similarly, the selected items or filler items are rated with the minimum
score in the Bandwagon (average), Bandwagon (random) and Segment attacks. In
the Random attack, the filler items are rated with some average score
(normal distribution around the system mean). Therefore, the number of
specific ratings can be used to partly evaluate the difference between
genuine profiles and attack profiles.

Filler Size with Maximum Rating in Itself (FSMAXRI): The ratio between the
number of items rated by user u with maximum score and the number of entire
items rated by user u.

$$\mathrm{FSMAXRI}_u = \frac{\sum_{i \in I_u} O(r_{u,i} = r_{\max})}{\sum_{k=1}^{|I|} O(r_{u,k})} \qquad (4.16)$$

where $r_{u,i}$ is the rating given by user u to item i, $r_{\max}$ is the
maximum score in the system, and $I_u$ denotes the set of items rated by
user u. $O(r_{u,i} = r_{\max})$ is 1 if user u rated item i with rating
$r_{\max}$, 0 otherwise. $O(r_{u,k})$ is 1 if user u rated item k, 0
otherwise.

Filler Size with Minimum Rating in Itself (FSMINRI): The ratio between the
number of items rated by user u with minimum score and the number of entire
items rated by user u.

$$\mathrm{FSMINRI}_u = \frac{\sum_{i \in I_u} O(r_{u,i} = r_{\min})}{\sum_{k=1}^{|I|} O(r_{u,k})} \qquad (4.17)$$

where $r_{\min}$ is the minimum score in the system. $O(r_{u,i} = r_{\min})$
is 1 if user u rated item i with rating $r_{\min}$, 0 otherwise.

Filler Size with Average Rating in Itself (FSARI): The ratio between the
number of items rated by user u with average score and the number of entire
items rated by user u.

$$\mathrm{FSARI}_u = \frac{\sum_{i \in I_u} O(r_{u,i} = r_{avg})}{\sum_{k=1}^{|I|} O(r_{u,k})}$$

where $r_{avg}$ is the average score in the system. $O(r_{u,i} = r_{avg})$
is 1 if user u rated item i with rating $r_{avg}$, 0 otherwise.
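
The three proposed features reduce to counting how often a user gives one
particular score. A minimal sketch follows; the function name
`specific_rating_features` is hypothetical, 0 is assumed to mark an
unrated item, and taking the "average score" as the system mean rounded to
the nearest integer is an assumption, since the text does not say how a
non-integer mean is matched against integer ratings.

```python
import numpy as np

def specific_rating_features(R, r_min=1, r_max=5, r_avg=None):
    """FSMAXRI, FSMINRI and FSARI for every user.

    R : (n_users, n_items) rating matrix, 0 = unrated (assumed convention).
    """
    rated = R > 0
    total = np.maximum(rated.sum(axis=1), 1)     # items rated, guarded
    if r_avg is None:
        # Assumption: "average score" = system mean rounded to an integer
        r_avg = np.rint(R[rated].mean())
    return {
        "FSMAXRI": (R == r_max).sum(axis=1) / total,
        "FSMINRI": (R == r_min).sum(axis=1) / total,
        "FSARI": (R == r_avg).sum(axis=1) / total,
    }
```

Because `r_min` is at least 1, the comparison `R == r_min` never matches
the zeros that stand for unrated items.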


4.3. Re-scale AdaBoost for attack detection

After the raw user profiles are transformed into a set of sophisticated
features, an effective detection method based on these features is crucial
for “shilling” attacks. As noted, the number of attackers is usually far
smaller than the number of genuine users in CFRSs, so supervised learning
based attack detection can be formulated as an imbalanced classification.
Conventional supervised learning based detection methods (e.g., SVM or kNN)
have individual weaknesses in handling this kind of problem. Under this
circumstance, Boosting comes into view, as it has been shown to be efficient
in difficult scenarios such as imbalanced classification [13]. In Boosting,
weak learners are fitted iteratively to the training data, using appropriate
methods to gradually increase emphasis on observations modelled poorly by
the existing collection of weak learners. More specifically, AdaBoost
applies weights to the observations (or samples), emphasising poorly
modelled ones and iteratively strengthening the correction of
misclassifications. The following Algorithm 1 describes the main idea of
AdaBoost [9].

Algorithm 1 AdaBoost

Step 1 (Initialization): Given data $\{(x_i, y_i) : i = 1, \ldots, m\}$,
where $x_i \in X$ and $y_i \in \{-1, +1\}$, weights
$w^{(0)} = \{w_i^{(0)} = \frac{1}{m} : i = 1, \ldots, m\}$, dictionary
$D_n = \{g_1, \ldots, g_n\}$, iteration number $T$ and
$f_0 \in \mathrm{span}(D_n)$.

Step 2: Find $g_t \in D_n$ that minimizes the weighted sum error

$$e_t = \Pr_{i \sim w^{(t-1)}}[g_t(x_i) \neq y_i] = \sum_{i=1}^{m} w_i^{(t-1)} \mathbb{1}[g_t(x_i) \neq y_i]$$

for misclassified samples.

Step 3: Choose

$$\alpha_t = \frac{1}{2} \ln \frac{1 - e_t}{e_t}$$

and update weights

$$w_i^{(t)} = \frac{w_i^{(t-1)} \exp(-\alpha_t y_i g_t(x_i))}{Z_t}$$

for all samples, where $Z_t = 2[e_t(1 - e_t)]^{1/2}$ is a normalization
factor.

Step 4: Add to ensemble $f_t = f_{t-1} + \alpha_t g_t$.

Step 5: Increase t by one and repeat Steps 2 to 4 if $t < T$.
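
The steps of AdaBoost can be sketched in a few lines of NumPy. The version
below uses one-feature threshold classifiers (decision stumps) as the
dictionary, which is an assumption made for the example, since the weak
learner is not fixed by the algorithm; the helper names `best_stump`,
`stump_predict`, `adaboost` and `predict` are likewise hypothetical.

```python
import numpy as np

def stump_predict(X, j, thr, sign):
    """Threshold classifier on feature j with outputs in {-1, +1}."""
    return sign * np.where(X[:, j] <= thr, -1, 1)

def best_stump(X, y, w):
    """Step 2: stump with the smallest weighted error under weights w."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                err = w[stump_predict(X, j, thr, sign) != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, T=20):
    m = len(y)
    w = np.full(m, 1.0 / m)                        # Step 1: uniform weights
    ensemble = []
    for _ in range(T):
        e, j, thr, sign = best_stump(X, y, w)      # Step 2
        e = np.clip(e, 1e-10, 1 - 1e-10)           # keep the logarithm finite
        alpha = 0.5 * np.log((1 - e) / e)          # Step 3: coefficient
        g = stump_predict(X, j, thr, sign)
        w = w * np.exp(-alpha * y * g)             # Step 3: re-weight samples
        w /= w.sum()                               # equals Z_t up to the clipping
        ensemble.append((alpha, j, thr, sign))     # Step 4: f_t = f_{t-1} + alpha g
    return ensemble

def predict(ensemble, X):
    f = sum(a * stump_predict(X, j, thr, s) for a, j, thr, s in ensemble)
    return np.where(f >= 0, 1, -1)
```

With labels +1 for attack profiles and -1 for genuine profiles, misclassified
attackers receive larger weights in later rounds, which is the property that
makes the method attractive for the imbalanced setting described here.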


From a statistical view, AdaBoost can also be viewed as a form of “Gradient
Boosting Machine”. Consider a loss function, in this case a measure that
represents the loss in predictive performance due to a sub-optimal model.
Boosting is a numerical optimisation technique for minimising the loss
function by adding at each step a new weak learner that best reduces (steps
down the gradient of) the loss function. The original gradient Boosting
algorithm was proved to be consistent, which can be easily deduced by
applying the method in [1] to [IH, Theorem 1]; however, a number of studies
also showed that its approximation rate is far slower. The numerical
convergence rate of Boosting is much slower than the minimax nonlinear
approximation rate. Here and hereafter, t denotes the number of iterations,
and $C_0$, $C_0'$ are absolute constants.

Recently, Lin et al. and Xu et al. [22] proposed re-scale Boosting
(RBoosting) to improve the performance of the original gradient Boosting.
Different from existing variants that focus on controlling the step-size,
such as Regularized shrinkage Boosting, Regularized truncated Boosting and
ε-Boosting, they pursued a novel direction to improve the numerical
convergence rate and, consequently, the generalization capability of
Boosting. The core idea is that if the approximation (or learning) effect of
the t-th iteration is not good, then we regard $f_t$ to be too aggressive
and therefore shrink it to a certain extent. With this modification, the
optimal numerical convergence of RBoosting can be guaranteed. This means
that RBoosting is among the almost optimal nonlinear approximants and,
therefore, may possess better learning performance than other Boosting-type
algorithms. Based on the general idea of RBoosting, the re-scale AdaBoost
(RAdaBoost) can be described as the following Algorithm 2.


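Algorithm 2 Re-scale AdaBoost

Step 1 (Initialization): Given data $\{(x_i, y_i) : i = 1, \ldots, m\}$,
where $x_i \in X$ and $y_i \in \{-1, +1\}$, weights
$w^{(0)} = \{w_i^{(0)} = \frac{1}{m} : i = 1, \ldots, m\}$, dictionary
$D_n = \{g_1, \ldots, g_n\}$, a set of shrinkage degrees
$\{s_t\}_{t=1}^{T}$ where $s_t = 2/(t + u)$, $u \in \mathbb{N}$, iteration
number $T$ and $f_0 \in \mathrm{span}(D_n)$.

Step 2: Find $g_t \in D_n$ that minimizes the weighted sum error

$$e_t = \Pr_{i \sim w^{(t-1)}}[g_t(x_i) \neq y_i] = \sum_{i=1}^{m} w_i^{(t-1)} \mathbb{1}[g_t(x_i) \neq y_i]$$

for misclassified samples.

Step 3: Choose

$$\alpha_t = \frac{1}{2} \ln \frac{1 - e_t}{e_t}$$

and update weights

$$w_i^{(t)} = \frac{w_i^{(t-1)} \exp(-\alpha_t y_i g_t(x_i))}{Z_t}$$

for all samples, where $Z_t = 2[e_t(1 - e_t)]^{1/2}$ is a normalization
factor.

Step 4: Add to ensemble $f_t = (1 - s_t) f_{t-1} + \alpha_t g_t$.

Step 5: Increase t by one and repeat Steps 2 to 4 if $t < T$.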
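
The only structural difference from AdaBoost is the re-scale step, in which
the current ensemble is shrunk by $(1 - s_t)$ before the new weak learner is
added. The sketch below illustrates that step under several assumptions: it
uses decision stumps as weak learners, keeps the AdaBoost-style coefficient
and multiplicative weight update, and picks $u = 10$ as the shrinkage
parameter. None of these choices is confirmed by the text, and the function
names are hypothetical.

```python
import numpy as np

def stump_predict(X, j, thr, sign):
    """Threshold classifier on feature j with outputs in {-1, +1}."""
    return sign * np.where(X[:, j] <= thr, -1, 1)

def best_stump(X, y, w):
    """Stump with the smallest weighted error under weights w."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                err = w[stump_predict(X, j, thr, sign) != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best

def radaboost(X, y, T=20, u=10):
    """Re-scale AdaBoost sketch; u is the shrinkage parameter (assumed value)."""
    m = len(y)
    w = np.full(m, 1.0 / m)
    ensemble = []                                   # (coefficient, j, thr, sign)
    for t in range(1, T + 1):
        s_t = 2.0 / (t + u)                         # shrinkage degree
        e, j, thr, sign = best_stump(X, y, w)
        e = np.clip(e, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - e) / e)
        g = stump_predict(X, j, thr, sign)
        w = w * np.exp(-alpha * y * g)              # assumed AdaBoost-style update
        w /= w.sum()
        # Re-scale: f_t = (1 - s_t) * f_{t-1} + alpha_t * g_t
        ensemble = [(c * (1 - s_t), jj, th, sg) for c, jj, th, sg in ensemble]
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    f = sum(c * stump_predict(X, j, thr, s) for c, j, thr, s in ensemble)
    return np.where(f >= 0, 1, -1)
```

Because every earlier coefficient is multiplied by $(1 - s_t)$ at each
round, older weak learners contribute progressively less than recent ones.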


5. Experiments and analysis

In this part, we first introduce the experimental settings, including the
data sets, evaluation metrics and computational environment. Secondly, the
impact of the extracted features is analyzed. Then, we compare the
performance of RAdaBoost with three benchmark methods (SVM, kNN and
AdaBoost) on 4 diverse attack models to demonstrate that RAdaBoost
outperforms them. Finally, the remaining 10 types of attacks are detected by
means of RAdaBoost to further evaluate its performance.

5.1. Experimental settings 

In our experiments, we use the MovieLens-100K|^ dataset as the data set 
describing the behaviors of genuine users in recommender system. MovieLens- 
lOOK was collected by the GroupLens Research Project at the University of 
Minnesota. It is the one of the most popular data sets used by researchers 
and developers in the field of collaborative filtering and attack detection in 
recommender systems. It consists of 100,000 ratings on 1682 movies by 943 
raters and each rater had to rate at least 20 movies. All ratings are
integer values between a minimum of 1 and a maximum of 5. The minimum
score means that the rater disliked the movie, while the maximum score
means that the rater enjoyed it. According to the information on the
MovieLens website (http://grouplens.org/datasets/movielens/), the sparse
ratio of the rating data (the ratio between the number of missing ratings
and the number of all entries in the rating matrix) is approximately 93.7%
and the average rating over all users is around 3.53. Besides, the Average
Number of Items Rated (ANIR) by each user is approximately 7%. Attack
profiles are generated according to different attack models (as shown in
Table 2). An attack profile expresses the attacker's intention that a
particular item should receive the highest or the lowest rating. In this
paper, we detect only nuke attacks; push attacks can be detected in an
analogous manner. For each attack model, we generate nuke attack profiles
with different attack sizes {1.1%, 6.4%, 11.7%, 17.0%, 22.3%, 27.6%} and
filler sizes {1.2%, 4.2%, 7.3%, 10.3%, 13.3%, 16.4%}. To ensure the
rationality of the results, the target item is randomly selected for each
attack profile. In addition, for bandwagon attacks, we select movies {50,
56, 100, 127, 174, 181, 258, 286, 288, 294} as the popular movies, each of
which is rated by more than 300 users in the system. In the segment attack,
we use movies {50, 183, 185, 200, 234, 443} as the segmented movies [16].
For the Reverse Bandwagon attack, we randomly choose 10 movies which are
rated by one user in the system as the selected movies.

For the training set, we use the whole MovieLens-100K dataset and generate
attack profiles from 7 representative known attack models (random, average,
bandwagon (average), segment, reverse bandwagon, PIA-ID and PUA-NR) with a
17.0% attack size (160 attackers) and diverse filler sizes {1.2%, 4.2%,
7.3%, 10.3%, 13.3%, 16.4%}. We then combine these 7 attack datasets with
the MovieLens-100K dataset to construct the mixed set of user profiles used
as our training data. Thus, the training dataset consists of 943 genuine
users and 1120 (160 x 7) attackers. For the test datasets, again based on
the whole MovieLens-100K dataset, we generate attack profiles from 14
attack models with different attack sizes {1.1%, 6.4%, 11.7%, 17.0%, 22.3%,
27.6%} and filler sizes {1.2%, 4.2%, 7.3%, 10.3%, 13.3%, 16.4%}. The
generated attack profiles are then inserted into the genuine profiles to
construct our test datasets. Therefore, we have 504 (14 x 6 x 6) test
datasets covering 14 attack models, 6 different attack sizes and 6
different filler sizes.
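
The dataset counts quoted above can be checked with a short script. The
following is a minimal Python sketch, not the authors' MATLAB code. It
assumes that an attack size is a fraction of the 943 genuine users (this
reproduces the 160 attackers quoted for 17.0%), and it identifies the 14
test attack models by placeholder indices because only their count is
needed here. All function names are hypothetical.

```python
from itertools import product

N_GENUINE = 943  # genuine users in MovieLens-100K, as stated in the text
ATTACK_SIZES = [0.011, 0.064, 0.117, 0.170, 0.223, 0.276]
FILLER_SIZES = [0.012, 0.042, 0.073, 0.103, 0.133, 0.164]
N_TRAIN_MODELS = 7   # attack models used to build the training set
N_TEST_MODELS = 14   # attack models used to build the test sets


def attackers_for(attack_size, n_genuine=N_GENUINE):
    """Number of injected profiles, assuming the attack size is a
    fraction of the number of genuine users (an assumption)."""
    return round(attack_size * n_genuine)


def test_grid(n_models=N_TEST_MODELS):
    """One (model index, attack size, filler size) triple per test set.
    Model indices are placeholders for the attack models of Table 2."""
    return list(product(range(n_models), ATTACK_SIZES, FILLER_SIZES))


def training_set_size(attack_size=0.170, n_models=N_TRAIN_MODELS):
    """Genuine users plus one batch of attackers per training model."""
    return N_GENUINE + n_models * attackers_for(attack_size)


if __name__ == "__main__":
    print("attackers per attack size:",
          [attackers_for(a) for a in ATTACK_SIZES])
    print("number of test datasets:", len(test_grid()))
    print("training profiles:", training_set_size())
```

Under these assumptions the grid has 14 x 6 x 6 = 504 entries and the
training set has 943 + 7 x 160 = 2063 profiles, in agreement with the text.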

To measure the effectiveness of the proposed detection method, we use
three metrics on the test sets, namely classification error, detection
rate and false alarm rate, which are used in similar experiments [5].
Classification error is defined as the number of misclassifications
divided by the number of all test user profiles.


$$\text{classification error} = \frac{\#\text{Misclassifications}}{\#\text{User Profiles}} \tag{5.1}$$


Detection rate is defined as the number of detected attack profiles divided 
by the number of attack profiles. 


$$\text{detection rate} = \frac{\#\text{Detection}}{\#\text{Attack Profiles}} \tag{5.2}$$


False alarm rate is the number of genuine profiles that are predicted as attack 
profiles divided by the number of genuine profiles. 


$$\text{false alarm rate} = \frac{\#\text{False Alarms}}{\#\text{Genuine Profiles}} \tag{5.3}$$
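
As an illustration, the three metrics defined in (5.1)-(5.3) can be
computed as in the following Python sketch. It assumes that genuine
profiles are labelled +1 and attack profiles -1; the function name
detection_metrics is hypothetical and the code is not the authors' MATLAB
implementation.

```python
def detection_metrics(y_true, y_pred, attack_label=-1):
    """Classification error, detection rate and false alarm rate.

    y_true, y_pred: equal-length sequences of labels, where attack_label
    marks an attack profile and any other value marks a genuine profile.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    n_attack = sum(1 for t, _ in pairs if t == attack_label)
    n_genuine = len(pairs) - n_attack

    misclassified = sum(1 for t, p in pairs if t != p)
    detected = sum(1 for t, p in pairs
                   if t == attack_label and p == attack_label)
    false_alarms = sum(1 for t, p in pairs
                       if t != attack_label and p == attack_label)

    return {
        "classification_error": misclassified / len(pairs),
        # Rates are undefined (NaN) when the corresponding class is empty.
        "detection_rate": detected / n_attack if n_attack else float("nan"),
        "false_alarm_rate": false_alarms / n_genuine if n_genuine else float("nan"),
    }


if __name__ == "__main__":
    # 943 genuine users and 10 attackers; a classifier that labels
    # every profile as genuine.
    y_true = [1] * 943 + [-1] * 10
    y_pred = [1] * 953
    print(detection_metrics(y_true, y_pred))
```

In this usage example the classifier detects no attacker at all, yet its
classification error is only 10/953 (about 1.05%), with a detection rate
of 0 and a false alarm rate of 0. This is why classification error alone
is not informative for such imbalanced data and why the detection rate is
reported alongside it.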


All numerical studies are implemented in MATLAB R2014a on a Windows
personal computer with a Core(TM) i7-3770 3.40GHz CPU and 16.00GB of RAM.

5.2. Impact of extracted features

Figure 2: Relationship between the number of extracted features and the
performance with respect to the filler size. Attack size is 17.0%.

Figure 3: The diagrams of clustering results with diverse features
employed, where red nodes denote attackers and green nodes denote genuine
users.

To evaluate the impact of the extracted features, we conduct a series of
experiments on several attack models with diverse filler sizes, as
illustrated in Figure 2. We use the EM (Expectation-Maximization)
clustering method (the clustering results were produced with Weka) to
separate attackers from genuine users as far as possible, based on 10
features (generic and type-specific features), 15 features (5 additional
features based on filler size) and 18 features (all of the aforementioned
features plus our 3 proposed features), respectively, in order to analyze
the relationship between the number of extracted features and the
performance with respect to the filler size. As shown in Figure 2,
Bandwagon (average), Segment, Reverse Bandwagon and PIA-AS attacks are
taken as examples. The results clearly show that the false alarm rate
decreases significantly as more extracted features are used. Furthermore,
Figure 3 gives two diagrams that show the clustering results intuitively
(Bandwagon (average) and Segment attacks are taken as examples). With the
attack size (17.0%, 160 attackers) and the filler size (13.3%, 170 items)
fixed, the clustering results illustrate that a hard classification task
becomes easier to perform when more well-designed features are employed.

5.3. Experimental results and analysis 

First, we compare the detection performance of RAdaBoost with three
benchmark methods, namely SVM, kNN and AdaBoost, on the test sets of 4
attack models described above to validate the superior performance of
RAdaBoost. The settings of each method are as follows:

• SVM: LibSVM with the default parameters is employed, as in [1], to train
a binary profile classifier with Prediction = +1 if a profile is classified
as authentic and Prediction = -1 if it is classified as attack. To classify
unseen test datasets, the SVM model (classifier) trained on the training
set is used to determine the class labels.

• kNN: The standard kNN algorithm is used, as in [23]. The k nearest
neighbors in the training set (k is chosen by 5-fold cross validation on
the training dataset) are collected for prediction using one over Pearson
correlation distance weighting.

• AdaBoost: We use decision stumps (with the number of splits J = 1) as
the weak learners for classification. The number of iterations (or the
number of stumps to be fitted) is also selected via 5-fold cross validation
on the training dataset.

• RAdaBoost: For the additional shrinkage degree parameter, $s_k = 2/(k +
u)$, $u \in \mathbb{N}$, in RBoosting, we create 20 equally spaced values
of u in logarithmic space between 1 and $10^6$ and select the appropriate
$u^*$ as in [25]. The other settings are the same as for AdaBoost.

Weka: http://www.cs.waikato.ac.nz/ml/weka/
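
The shrinkage schedule and the candidate grid for u in the RAdaBoost
setting can be sketched in Python as follows. This is an illustration
under stated assumptions rather than the authors' implementation: the
upper bound of the grid is exposed as a parameter with an assumed default
of 10^6, the candidates are rounded to natural numbers because u is taken
from N, and the function names are hypothetical. How the shrinkage degree
enters the boosting update and how u* is selected follow [25] and are not
reproduced here.

```python
import math


def shrinkage_degree(k, u):
    """Shrinkage degree s_k = 2 / (k + u) at boosting iteration k >= 1."""
    if k < 1 or u < 1:
        raise ValueError("k and u must both be at least 1")
    return 2.0 / (k + u)


def u_candidates(n_values=20, low=1.0, high=1e6):
    """n_values candidates for u, equally spaced on a log10 scale
    between low and high and rounded to the nearest natural number.
    The default upper bound is an assumption."""
    log_low, log_high = math.log10(low), math.log10(high)
    step = (log_high - log_low) / (n_values - 1)
    return [round(10 ** (log_low + i * step)) for i in range(n_values)]


if __name__ == "__main__":
    for u in u_candidates():
        # First three shrinkage degrees for each candidate u.
        print(u, [shrinkage_degree(k, u) for k in (1, 2, 3)])
```

For a fixed u the shrinkage degree decreases as k grows, and a larger u
gives a smaller shrinkage degree at every iteration, so the grid spans
schedules from strong to almost no shrinkage.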

Fig. 4 to Fig. 7 illustrate the performance surfaces of the RAdaBoost
classifier on the aforementioned test sets, which cover 4 attack models
(AOP, Bandwagon (random), PIA-NR and PUA-NR attacks are taken as examples)
with different attack sizes {1.1%, 6.4%, 11.7%, 17.0%, 22.3%, 27.6%} and
filler sizes {1.2%, 4.2%, 7.3%, 10.3%, 13.3%, 16.4%}. For comparison, the
performance surfaces of SVM, kNN and AdaBoost are also presented.

It can be observed from these figures that the classification error of SVM
rises with the attack size, which implies that the SVM classifier cannot
effectively classify and detect attacks generated by these 4 types of
attack models when the total number of attack profiles is far smaller than
the number of genuine users. Although SVM achieves fairly high
classification performance in some regions with small attack sizes, its
detection rates there are also low. This shows that the high prediction
accuracy comes almost entirely from the abundant genuine users, while the
classifier fails to capture the few attackers of interest. In our 4 sets of
experiments, only a few bandwagon (random) attack profiles are detected by
SVM and, naturally, SVM raises almost no false alarms.

kNN outperforms SVM on all 4 types of attacks, with a lower classification
error, a much higher detection rate and a small false alarm rate. However,
we also notice that the classification performance of kNN is still poor and
that it may fail to detect attacks in certain regions of attack and filler
size. As the figures show, kNN fails to detect AOP and PIA-NR attacks when
the filler size is too small, and bandwagon (random) attacks when the
attack size is small. For the PUA-NR attack, kNN is only slightly better
than SVM and barely detects any attack profiles. For the AOP attack, the
kNN results may indicate that some genuine profiles are misclassified as
attack profiles, since a large number of genuine profiles have the same or
a similar number of popular items as the AOP attack profiles when the
filler size is very large or very small.

Compared with SVM and kNN, Boosting significantly improves the
classification performance because it iteratively strengthens the
correction of misclassifications. Hence, AdaBoost further increases the
detection rate with a very low false alarm rate. However, as the figures
show, although AdaBoost can effectively detect attacks over a wide range of
attack and filler sizes, it fails in certain regions (i.e., AOP and PIA-NR
attacks with small attack size and high filler size, and bandwagon (random)
attacks with small attack and filler sizes). In particular, AdaBoost cannot
effectively detect PUA-NR attacks over a large region of high attack sizes.
As the figures show, RAdaBoost further improves the classification
performance of AdaBoost by imposing a re-scale operator and consequently
increases the detection rate with a negligible false alarm rate for the 4
types of attacks. So far, all the comparative experimental results
illustrate that RAdaBoost outperforms Boosting and conventional supervised
learning based detection methods, including SVM and kNN.

Finally, to further evaluate the effectiveness of RAdaBoost, we also test
it on the other 10 types of attacks and show its performance surfaces in
Fig. 8. From the results, we can clearly observe that, except for PUA-AS
and PUA-ID attacks, RAdaBoost effectively detects all of the attacks with
almost no false alarms, although it also shows low detection rates for some
attacks with small attack and filler sizes. Compared with previous research
results ISIESIISO], the detection performance of RAdaBoost is more
promising. PUA-AS and PUA-ID attacks are recently published attack models
to which few researchers have paid close attention. As the figures show,
RAdaBoost cannot effectively detect such attacks, mainly because the
currently extracted features (as described in Section 4) are not enough to
capture their essential characteristics. Therefore, the results indicate
that new, adapted classification features are needed for detecting such
new attacks as PUA-AS, PUA-NR and PUA-ID.
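
The remark that Boosting iteratively strengthens the correction of
misclassifications can be made concrete with one reweighting round of the
textbook discrete AdaBoost. The Python sketch below is an illustration
only: it assumes labels in {+1, -1}, uses a hypothetical function name, and
shows neither the re-scale step of RAdaBoost nor the authors' MATLAB code.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One round of discrete AdaBoost reweighting.

    weights: current non-negative sample weights.
    y_true, y_pred: true labels and weak-learner predictions in {+1, -1}.
    Returns (alpha, new_weights), where alpha is the vote of the weak
    learner and new_weights sum to 1.
    """
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")

    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples (t * p == -1) are scaled up, correctly
    # classified samples (t * p == +1) are scaled down.
    scaled = [w * math.exp(-alpha * t * p)
              for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(scaled)
    return alpha, [w / norm for w in scaled]


if __name__ == "__main__":
    # Three genuine profiles (+1) and one attacker (-1); the weak learner
    # labels every profile as genuine and so misses the attacker.
    alpha, new_w = adaboost_reweight([0.25] * 4, [1, 1, 1, -1], [1, 1, 1, 1])
    print(alpha, new_w)
```

In this toy round the single missed attacker ends up carrying half of the
total weight, so the next weak learner is pushed to classify it correctly.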

6. Conclusion and further discussions 

“Shilling” attacks, or “profile injection” attacks, are serious threats to
collaborative filtering recommender systems (CFRSs). Since the number of
attackers to be detected is far smaller than the number of genuine users,
conventional supervised learning based detection methods are challenged by
this imbalanced classification problem. In the present paper, we improved
the detection performance in two directions. First, we extracted features
from user profiles based on the statistical properties of the diverse
attack models to make the profiles much easier to classify. Then, we
applied a variant of the Boosting algorithm, called re-scale AdaBoost
(RAdaBoost), as our detection method, which gradually increases the
emphasis on the attacks of interest and can clearly improve the predictive
performance on a difficult classification task. Our experimental results
also demonstrated the superior performance of RAdaBoost in “shilling”
attack detection.

In our future work, we will explore simpler and more effective features to
characterize attack profiles from different perspectives. The existing
features, which are based on basic descriptive statistics or are
model-specific, cannot fully discriminate between attack profiles and
genuine profiles across diverse attack models. In addition, some features
based on globally computed similarity, such as DegSim (similarity with top
neighbors), are impractical for massive numbers of user profiles, although
they are effective at capturing the attack profiles of interest.
Therefore, how to extract local and effective features from user profiles
is still an open issue.

Figure 4: The classification error, detection rate and false alarm rate of
RAdaBoost on different test sets in comparison with SVM, kNN and AdaBoost.
AOP attack with diverse attack sizes {1.1%, 6.4%, 11.7%, 17.0%, 22.3%,
27.6%} and filler sizes {1.2%, 4.2%, 7.3%, 10.3%, 13.3%, 16.4%}.

Figure 5: The classification error, detection rate and false alarm rate of
RAdaBoost on different test sets in comparison with SVM, kNN and AdaBoost.
Bandwagon (random) attack with diverse attack sizes {1.1%, 6.4%, 11.7%,
17.0%, 22.3%, 27.6%} and filler sizes {1.2%, 4.2%, 7.3%, 10.3%, 13.3%,
16.4%}.

Figure 6: The classification error, detection rate and false alarm rate of
RAdaBoost on different test sets in comparison with SVM, kNN and AdaBoost.
PIA-NR attack with diverse attack sizes {1.1%, 6.4%, 11.7%, 17.0%, 22.3%,
27.6%} and filler sizes {1.2%, 4.2%, 7.3%, 10.3%, 13.3%, 16.4%}.

Figure 7: The classification error, detection rate and false alarm rate of
RAdaBoost on different test sets in comparison with SVM, kNN and AdaBoost.
PUA-NR attack with diverse attack sizes {1.1%, 6.4%, 11.7%, 17.0%, 22.3%,
27.6%} and filler sizes {1.2%, 4.2%, 7.3%, 10.3%, 13.3%, 16.4%}.

Figure 8: The detection rate and false alarm rate of RAdaBoost in 10
different attack models with diverse attack sizes {1.1%, 6.4%, 11.7%,
17.0%, 22.3%, 27.6%} and filler sizes {1.2%, 4.2%, 7.3%, 10.3%, 13.3%,
16.4%}.